(57) Abstract

A fusion sequence having a carrier protein which is preferably an E. coli protein having a predicted solubility probability of at least 
90 % fused to a target heterologous peptide or protein, and a host cell (especially E. coli) transformed with, or having integrated into 
its genome, a DNA sequence comprising a DNA encoding a carrier protein as defined herein fused to the DNA sequence of a selected 
heterologous peptide or protein. This fusion sequence is under the control of an expression control sequence capable of directing the 
expression of a fusion protein in the cell. An objective of the present invention is to improve the purification process of recombinant fusion 
proteins by avoiding the initial expression of these fusion proteins in E. coli as insoluble inclusion bodies. The methods and compositions 
of the present invention permit the production of large amounts of heterologous peptides or proteins in a stable, soluble form in certain host 
cells which normally express limited amounts of such soluble peptides or proteins. The present invention produces fusion proteins which 
retain the desirable characteristics of a carrier protein (i.e., stability, solubility, and a high level of expression). 






WO 99/13091 PCT/US98/19101
FUSION PROTEIN SYSTEMS 



BACKGROUND 

The present invention relates to recombinant methods of
producing fusion proteins using E. coli proteins as carrier
proteins for producing soluble fusion proteins.

A major benefit resulting from the advent of recombinant DNA
technology has been the large scale production of proteins of
medical or industrial importance. The simplest and most
inexpensive means available for obtaining large amounts of
proteins by recombinant DNA technology is by expression of
protein genes in bacteria (Georgiou and Valax, 1996). The
efficient synthesis of heterologous proteins in the bacterium
Escherichia coli has now become routine. However, when high
expression levels are achieved, recombinant proteins are
frequently expressed in E. coli as insoluble protein aggregates
termed "inclusion bodies." Although initial purification of
inclusion body material is relatively simple, the protein must
be subsequently refolded into an active form, which is typically
a cumbersome trial-and-error process (Georgiou and Valax, 1996).
Thus, it is much more desirable to express the recombinant
protein in soluble form.

A strategy to avoid inclusion body formation is to fuse the
protein of interest (i.e., the target protein) to a protein known
to be expressed at substantial levels in soluble form in E. coli
(i.e., the carrier protein). The most widely used carrier protein
for the purpose of solubilization is thioredoxin from E. coli
(LaVallie et al., 1993). A fusion protein system using
thioredoxin for solubilization of target proteins is now being
marketed by Invitrogen Corporation (Carlsbad, CA). However,
despite being touted for its ability to solubilize proteins in a
fusion protein, thioredoxin does not always lead to formation of
a fusion protein which is soluble at the normal E. coli growth
temperature of 37°C. LaVallie et al. (1993) used thioredoxin as
a carrier protein to express 11 human and murine cytokines. Of
the 11 proteins, only 4 were expressed in soluble form as
thioredoxin fusions at 37°C. The non-soluble fusion proteins with
thioredoxin could be expressed in soluble form by reducing the
growth temperature for expression to as low as 15°C. Several
problems with the use of thioredoxin fusions as a general means
of protein solubilization are apparent. For example, having
to reduce the expression temperature to as low as 15°C may give
unacceptably low rates of protein expression and slow growth
rates. Also, due to the small size of thioredoxin (11.7
kilodaltons), fusions with larger proteins may not be soluble;
that is, thioredoxin may not be large enough to compensate for the
insolubility of a large protein.

Two other E. coli fusion protein systems are widely used:
fusions with E. coli maltose-binding protein (Guan et al., 1988),
which is 40 kilodaltons in size, and fusions with Schistosoma
japonicum glutathione S-transferase (Smith and Johnson, 1988), 26
kilodaltons in size. Both of these systems were developed with
the objective of enabling an affinity purification of the fusion
protein to be carried out. Both systems tend to give soluble
fusion proteins but fail to do so approximately 25% of
the time (New England Biolabs Tech Data Sheet, 1992; Smith and
Johnson, 1988). Thus, thioredoxin fusions appear to be more
soluble than either maltose-binding protein fusions or
glutathione S-transferase fusions. An E. coli fusion protein
system which could be reliably produced in a soluble form would
be desirable.

SUMMARY OF THE INVENTION 
In one aspect, the present invention provides a fusion 
sequence comprising a carrier protein comprising an E. coli 

protein having a predicted solubility probability of at least 90% 
fused to a target heterologous peptide or protein. The peptide 
or protein may be fused to the amino terminus of the soluble 
protein or the carboxyl terminus of the soluble protein. The 
fusion sequence according to this invention may optionally 
contain a linker peptide between the carrier protein and the 
selected peptide or protein. This linker provides, where needed, 
a selected cleavage site or a stretch of amino acids capable of 
preventing steric hindrance between the carrier protein and the 
target peptide or protein. 

In another aspect, the present invention provides a DNA 
molecule encoding the fusion sequence defined above in 
association with, and under the control of, an expression control 
sequence capable of directing the expression of the fusion 
protein in a desired host cell, in particular, E. coli. 

Still a further aspect of the invention is a host cell
(especially E. coli) transformed with, or having integrated into
its genome, a DNA sequence comprising a DNA encoding a carrier
protein as defined herein fused to the DNA sequence of a selected
heterologous peptide or protein. This fusion sequence is
desirably under the control of an expression control sequence
capable of directing the expression of a fusion protein in the
cell.

In yet another aspect, there is provided herein a novel
method for increasing the expression of soluble recombinant
proteins. The method includes culturing under suitable
conditions the above-described host cell to produce the fusion
protein.

In one embodiment of the method contemplated herein, if the
resulting fusion protein is cytoplasmic, the cell can be lysed
by conventional means to obtain the soluble fusion protein. More
preferably in the case of cytoplasmic fusion proteins, the method
includes releasing the fusion protein from the host cell by a
method such as sonication or homogenization. The fusion protein
is then purified by conventional means. In still another
embodiment, if a secretory leader is employed in the fusion
protein construct, the fusion protein can be recovered from a
periplasmic extract or from the cell culture medium (using
osmotic shock to release the protein). As yet a further step in
the above methods, the desired heterologous protein can be
cleaved from fusion with the carrier protein by conventional
means.

Other aspects and advantages of the present invention will
be apparent upon consideration of the following detailed
description of preferred embodiments.

In particular, the objective of the present invention is to
improve the purification process of recombinant fusion proteins
by avoiding the initial expression of these fusion proteins in E.
coli as insoluble inclusion bodies by using E. coli carrier
proteins having predicted solubility probabilities of at least
90%.

DESCRIPTION OF THE DRAWINGS 
Figure 1 is a graphical representation of the prediction of
protein solubility probability in E. coli from CV-CVbar. If
CV-CVbar is positive, the protein is predicted to be insoluble.
If CV-CVbar is negative, the protein is predicted to be soluble.

Figure 2 is a schematic drawing of the fusion protein
containing the target protein desired to be solubilized.

Figure 3, (A) is an SDS-PAGE and (B) is a western blot of
soluble and insoluble cell fractions for fusion proteins
comprising NusA, GrpE, BFR and thioredoxin as carrier proteins and
hIL-3 as the heterologous protein.

Figure 4 is an SDS-PAGE (A) and western blot (B) of soluble
and insoluble fractions of 2X-YjgD/hIL-3 clones.

Figure 5 is a western blot of whole cell induced cultures
expressing hIL-3 fusion proteins with NusA, 2X-YjgD, GrpE and
thioredoxin carrier proteins.

Figure 6, (A) is an SDS-PAGE and (B) is a western blot of
bovine growth hormone (bGH) expressed as a fusion to NusA.

Figure 7, (A) is an SDS-PAGE and (B) is a western blot of
human interferon-γ (hIFN-γ) expressed as a fusion to NusA.
DESCRIPTION OF THE INVENTION
The methods and compositions of the present invention permit
the production of large amounts of heterologous peptides or
proteins in a stable, soluble form in certain host cells which
normally express limited amounts of such soluble peptides or
proteins. The present invention produces fusion proteins which
retain the desirable characteristics of a carrier protein (i.e.,
stability, solubility, and a high level of expression).

According to the present invention, the DNA sequence
encoding a heterologous (target) peptide or protein selected for
expression in a recombinant system is desirably fused directly
or indirectly to a DNA sequence encoding a carrier protein as
defined herein for expression in the host cell.

This invention is directed toward improving the process of
producing proteins by recombinant DNA technology in bacteria,
either in the laboratory or in larger scale facilities. The
invention would be very useful for the production of those
heterologous proteins in bacteria that are normally insoluble
when expressed in the bacteria. This invention could be used by
laboratory researchers in biotechnology who need to express
recombinant proteins substantially in soluble form in bacterial
cells. The invention could also be used in large scale
industrial processes for the efficient production of recombinant
proteins that are generally insoluble when expressed by
themselves in bacteria.

The selection of a carrier protein for the fusion protein
which results in a more soluble fusion protein is based upon a
revised version of the quantitative model developed by Wilkinson
and Harrison (1991), for prediction of the solubility of
recombinant proteins expressed in E. coli at 37°C. This model,
which was based on data in the literature on 81 proteins expressed
in E. coli, showed that the probability of a given protein being
soluble is primarily dependent upon two parameters: the fraction
of turn-forming residues (Asn, Gly, Pro, and Ser), and the absolute
value of the charge average minus 0.03.

The revised Wilkinson-Harrison solubility model involves
calculating a canonical variable (CV) or composite parameter for
the protein for which solubility is being predicted. The
two-parameter model is defined as:

$$CV = \lambda_1 \left( \frac{N+G+P+S}{n} \right) + \lambda_2 \left| \frac{(R+K)-(D+E)}{n} - 0.03 \right| \qquad \text{(I)}$$

where n = number of amino acids in the protein;
N, G, P, S = number of residues of Asn, Gly, Pro, or Ser,
respectively;
R, K, D, E = number of residues of Arg, Lys, Asp, or Glu,
respectively;
\lambda_1 = 15.43; and
\lambda_2 = -29.56.
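
As a hedged illustration of Equation I, the following Python sketch computes CV from a one-letter amino acid sequence. The function name canonical_variable and the synthetic test sequence are assumptions made for illustration only; the sequence merely reproduces the residue counts listed for NUSA_ECOLI in Table 2 and is not the actual NusA sequence.

```python
# Coefficients of the two-parameter model (Equation I).
LAMBDA_1 = 15.43
LAMBDA_2 = -29.56


def canonical_variable(sequence: str) -> float:
    """Return CV for a protein given in one-letter amino acid code."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    # Fraction of turn-forming residues: Asn, Gly, Pro, Ser.
    turn_fraction = sum(seq.count(aa) for aa in "NGPS") / n
    # Charge average: (Arg + Lys) - (Asp + Glu), per residue.
    charge_average = (
        seq.count("R") + seq.count("K") - seq.count("D") - seq.count("E")
    ) / n
    return LAMBDA_1 * turn_fraction + LAMBDA_2 * abs(charge_average - 0.03)


# Synthetic stand-in carrying the Table 2 counts for NUSA_ECOLI
# (Arg 33, Asn 19, Asp 42, Glu 56, Gly 28, Lys 25, Pro 14, Ser 15; n = 495).
# Alanine is used as filler; this is NOT the real NusA sequence.
counts = {"R": 33, "N": 19, "D": 42, "E": 56, "G": 28, "K": 25, "P": 14, "S": 15}
synthetic = "".join(aa * k for aa, k in counts.items())
synthetic += "A" * (495 - len(synthetic))

cv = canonical_variable(synthetic)
print(f"CV = {cv:.2f}")
```

Because the model depends only on residue counts and chain length, any sequence with the same composition gives the same CV, which is why a filler-padded stand-in suffices here.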

The probability of the protein being soluble is determined based
on the parameter CV-CVbar, where CVbar is the discriminant, equal to
1.71. If CV-CVbar is positive, the protein is predicted to be
insoluble, while if CV-CVbar is negative, the protein is predicted
to be soluble. The probability of solubility or
insolubility can be determined from CV-CVbar using the graph in
Figure 1 or from the following equation:

$$\text{Probability of Solubility or Insolubility} = 0.4934 + 0.276\,\left| CV - \overline{CV} \right| - 0.0392\,\left( CV - \overline{CV} \right)^2 \qquad \text{(II)}$$
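
The classification rule and Equation II can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name predict_solubility is hypothetical, and the clamping of large magnitudes is an assumption, added because the quadratic fit reaches its maximum (about 0.979) near a magnitude of 3.52 and would otherwise decrease again, whereas Table 1 reports ">97% soluble" for the most negative value.

```python
from typing import Tuple

CV_BAR = 1.71  # discriminant given in the text


def predict_solubility(cv: float) -> Tuple[str, float]:
    """Return (predicted class, probability of that class) for a given CV."""
    delta = cv - CV_BAR
    # Positive delta -> insoluble; negative delta -> soluble.
    # The text does not cover delta == 0; it is treated as soluble here.
    label = "insoluble" if delta > 0 else "soluble"
    # Assumption: clamp |delta| at the vertex of the quadratic so the
    # fitted curve is not extrapolated past its maximum.
    x = min(abs(delta), 0.276 / (2 * 0.0392))
    probability = 0.4934 + 0.276 * x - 0.0392 * x ** 2
    return label, probability


label, p = predict_solubility(CV_BAR - 2.10)
print(label, f"{p:.2%}")
```

With CV-CVbar equal to -2.10 the sketch gives a solubility probability of about 90%, which matches the cutoff quoted for Table 2.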
Four E. coli proteins, NusA protein, GrpE protein,
bacterioferritin (BFR) and 2X-YjgD protein (two YjgD proteins
connected at the carboxy terminus of one and at the amino
terminus of the other), were selected for further study for use
as carrier proteins based upon the new two-parameter version of
this model. The properties of fusion proteins comprising carrier
proteins of the present invention were compared with thioredoxin
alone and as a fusion protein as shown in Table 1.

Protein          MW (kDa)   Amino Acid   Probability of
                            Length       Solubility or Insolubility*
2X-YjgD          31.2       276          >97% soluble
NusA             55.0       495          95% soluble
BFR              18.5       158          95% soluble
GrpE             21.7       197          92% soluble
thioredoxin      11.7       109          73% soluble
hIL-3            15.1       133          73% insoluble
bGH              21.6       189          85% insoluble
hIFN-γ           17.1       146          96% insoluble

Fusion Protein
2X-YjgD/hIL-3    47.3       417          >97% soluble
NusA/hIL-3       70.6       634          86% soluble
NusA/bGH         77.4       690          80% soluble
NusA/hIFN-γ      72.7       647          79% soluble
GrpE/hIL-3       37.3       336          72% soluble
BFR/hIL-3        34.1       297          72% soluble
thio/hIL-3       26.8       248          54% insoluble

Table 1. Predicted Solubilities of Carrier, Target Proteins, and
Carrier/Target Fusion Proteins. The carrier proteins have relatively high
solubility probabilities while the target proteins (hIL-3, bGH, and hIFN-γ) do
not. Note: the Ile-Glu-Gly-Arg sequence for factor Xa cleavage and the amino
acids Thr-Gly created by an AgeI restriction site are included in the MW and
solubility calculations. * See Equations I and II.
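
The note under Table 1 implies a simple arithmetic check: each fusion length should equal the carrier length plus the target length plus six linker residues (Ile-Glu-Gly-Arg and Thr-Gly). The following Python sketch, with dictionary and function names chosen only for illustration, applies that check to the lengths listed in Table 1.

```python
# Residues added between carrier and target, per the note under Table 1:
# Ile-Glu-Gly-Arg (factor Xa site) plus Thr-Gly (AgeI site).
LINKER_RESIDUES = 4 + 2

# Lengths as listed in Table 1.
carrier_length = {"NusA": 495, "GrpE": 197, "BFR": 158, "thio": 109}
target_length = {"hIL-3": 133, "bGH": 189, "hIFN-gamma": 146}
listed_fusion_length = {
    ("NusA", "hIL-3"): 634,
    ("NusA", "bGH"): 690,
    ("NusA", "hIFN-gamma"): 647,
    ("GrpE", "hIL-3"): 336,
    ("BFR", "hIL-3"): 297,
    ("thio", "hIL-3"): 248,
}


def expected_fusion_length(carrier: str, target: str) -> int:
    """Carrier + linker + target, in amino acid residues."""
    return carrier_length[carrier] + LINKER_RESIDUES + target_length[target]


# Collect any entries whose listed length disagrees with the arithmetic.
mismatches = [
    pair
    for pair, listed in listed_fusion_length.items()
    if expected_fusion_length(*pair) != listed
]
print(mismatches)
```

The 2X-YjgD/hIL-3 entry is left out of the check: 276 + 133 + 6 gives 415 rather than the listed 417, and this section does not say where the two additional residues come from.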

Thus, each of the four selected proteins, alone or as a
fusion protein, is predicted to be more soluble than thioredoxin
in the corresponding form when expressed in E. coli at 37°C.

All E. coli proteins that are predicted by the model to have
90% or greater probability of being soluble and which comprise
at least 100 amino acids are within the class of carrier proteins
which are contemplated as falling within the present invention.
Examples are shown in Table 2.

Table 2. Escherichia coli proteins of 100 amino acids or greater in length in the SWISS-PROT protein databank
which have a calculated CV-CVbar value of -2.10 or less, which are predicted by the two-parameter solubility model of
Wilkinson and Harrison to have a solubility probability of 90% or greater when expressed in the E. coli cytoplasm.

SWISS-PROT    Total No. of             Amino acids used for CV-CVbar calculation
Protein Name  Amino Acids   CV-CVbar   Arg  Asn  Asp  Glu  Gly  Lys  Pro  Ser
RPSD_ECOLI    613           -2.48       46   19   54   71   24   34   19   29
FTSY_ECOLI    497           -3.15       18   11   22   79   30   33   22   15
AMY2_ECOLI    495           -2.17       20   21   44   40   40   20   23   14
NUSA_ECOLI    495           -2.62       33   19   42   56   28   25   14   15
YRFI_ECOLI    294           -2.67       12   16   23   26   19    7   14    7
MAZG_ECOLI    263           -3.03       21    7   21   32   10   12    7    7
S3AD_ECOLI    263           -2.38       16    5   18   27   13    8   14   12
SSEB_ECOLI    261           -2.24        9    6   12   34   17   13   15   14
YCHA_ECOLI    252           -2.42       13   11   18   26    9    8   12   15
YAGJ_ECOLI    243           -2.26       14    5   19   26   12   15    9   10
YFBN_ECOLI    238           -2.65       15   10   19   23    9   13    4    3
NARJ_ECOLI    236           -2.44       13    4   19   19   11    8    9   11
NARW_ECOLI    231           -2.36       13    6   21   20   11    9   10   13
YECA_ECOLI    221           -2.54        8    6   13   28   12   11   14   11
CHEZ_ECOLI    214           -2.49       14    5   22   15    7    6    9   13
GRPE_ECOLI    197           -2.34       11    8   13   26    8   13    9    7
SLYD_ECOLI    196           -2.98        4    6   19   17   29    6    5    5
YJAG_ECOLI    196           -3.33       11    5   12   25    9    6    5   10
YIEJ_ECOLI    195           -2.77        6    6   16   21   17   10    8    7
YGFB_ECOLI    194           -3.22        4    8   18   16   17    4    9    8
YJDC_ECOLI    191           -2.13       15    4   15   15    7    6    7    5
YCDY_ECOLI    184           -3.12        9    3   12   21    8    3   11   12
AADB_ECOLI    177           -2.65       13    2   13   20   14    4    9    5
FLAV_ECOLI    175           -4.09        4    4   20   17   15    9    5    5
FLAW_ECOLI    173           -3.32        2    5   16   16   16    8    6    7
YCED_ECOLI    173           -2.57        7    4   12   19    5    8   12   10
YFHE_ECOLI    171           -2.62       13    2   14   16    3    8    3    9
ASR_ECOLI     169           -2.62       14    8    0    5    5   18   10    9
YGGD_ECOLI    169           -2.76        4    6   17    9    5    8    5    9
YHBS_ECOLI    167           -2.39       11    3   15   13   16    2    6    6
FTN_ECOLI     165           -2.21        4    8    7   19    5    9    4   12
MENG_ECOLI    161           -2.44        8    7   15   14   21    2    3    7
YBEL_ECOLI    160           -2.13       14    4    9   22    7    7    5    8
BFR_ECOLI     158           -2.68        9   10   14   18   11    9    1    4
SMG_ECOLI     157           -5.53        7    6   12   23    5    3    4    3
HYCI_ECOLI    156           -2.55        5    7   14   13   16    4   10    2
SECB_ECOLI    155           -2.18        4    7    8   13    9    3    7    8
YBEY_ECOLI    155           -4.29        3    4    9   23    8    5    8    9
Table 2 cont...

SWISS-PROT    Total No. of
Protein Name  Amino Acids   CV-CVbar   Arg  Asn  Asp  Glu  Gly  Lys  Pro  Ser
ELAA_ECOLI    153           -2.39        6    4   12   12   10    5    6    7
YFJX_ECOLI    152           -2.37        6    5   11   11   11    1    7    8
MIOC_ECOLI    146           -2.23        2    4   10   16   15    7    7   10
YJGD_ECOLI    138           -9.37        3    3   22   25    ?    ?    ?    ?
HYFJ_ECOLI    137           -2.51        ?    ?    ?    ?    ?    ?    ?    ?
RL16_ECOLI    136           -2.45        ?    ?    ?    ?    ?    ?    ?    ?
RS6_ECOLI     135           -3.17        ?    ?    ?    ?    ?    ?    ?    ?

[? = illegible in the source scan; the remaining rows of the table are illegible.]

Preferably, the carrier protein of the present invention has a
solubility probability of 90% or greater as determined from the revised
Wilkinson-Harrison solubility model, shown herein, wherein the parameter
CV-CVbar is negative.
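
The carrier criterion (at least 100 amino acids and CV-CVbar of -2.10 or less, corresponding to a solubility probability of 90% or greater) can be checked directly from residue counts. The Python sketch below is illustrative only: the Row type and the function names are hypothetical, and the three rows are copied from the legible part of Table 2.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    n: int
    listed_delta: float  # CV-CVbar as printed in Table 2
    arg: int
    asn: int
    asp: int
    glu: int
    gly: int
    lys: int
    pro: int
    ser: int


def cv_minus_cvbar(row: Row) -> float:
    """Recompute CV-CVbar from residue counts using Equation I."""
    turn = (row.asn + row.gly + row.pro + row.ser) / row.n
    charge = ((row.arg + row.lys) - (row.asp + row.glu)) / row.n
    cv = 15.43 * turn - 29.56 * abs(charge - 0.03)
    return cv - 1.71


def is_candidate_carrier(row: Row) -> bool:
    """At least 100 residues and CV-CVbar of -2.10 or less."""
    return row.n >= 100 and cv_minus_cvbar(row) <= -2.10


rows = [
    Row("NUSA_ECOLI", 495, -2.62, 33, 19, 42, 56, 28, 25, 14, 15),
    Row("GRPE_ECOLI", 197, -2.34, 11, 8, 13, 26, 8, 13, 9, 7),
    Row("BFR_ECOLI", 158, -2.68, 9, 10, 14, 18, 11, 9, 1, 4),
]

for row in rows:
    print(row.name, f"{cv_minus_cvbar(row):.2f}", is_candidate_carrier(row))
```

Recomputing the value from the counts also serves as a consistency check on a transcribed table: for these three rows the recomputed CV-CVbar agrees with the printed column to two decimal places.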
The present invention comprises a method of purifying a
heterologous protein wherein the heterologous protein is expressed
from a fusion gene comprising a gene encoding the heterologous
protein which optionally is linked via a linker gene to a gene
encoding a carrier protein as described elsewhere herein. Linker
genes are well known to those of ordinary skill in the art. As
noted, an example is the linker gene which encodes the
Ile-Glu-Gly-Arg linker which is cleaved by Factor Xa protease.
Other linkers may be cleaved by trypsin, enterokinase, collagenase
and thrombin, for example. Other linkers and their linked genes will
be readily apparent to those of ordinary skill in the art.
Alternatively, the cleavage site in the linker may be a site capable
of being cleaved upon exposure to a selected chemical, e.g., cyanogen
bromide, hydroxylamine, or low pH.

A schematic drawing of a fusion protein contemplated by the
present invention which contains the carrier protein and the
insoluble heterologous protein is shown in Figure 2. The carrier
protein and the target protein are connected by a cleavable linker
sequence which can be cleaved by an enzyme to release the target
protein. An example of a cleavable linker sequence is, as noted
above, Ile-Glu-Gly-Arg, which is cleaved at the carboxy terminus of
Arg by Factor Xa protease.
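
To make the cleavage geometry concrete, the following Python sketch splits a fusion sequence at the carboxy terminus of the Arg in Ile-Glu-Gly-Arg (IEGR in one-letter code). The function name and both example sequences are hypothetical placeholders, not sequences from this document, and the sketch simply cuts at the first occurrence of the site.

```python
from typing import Tuple


def cleave_at_factor_xa(fusion: str, site: str = "IEGR") -> Tuple[str, str]:
    """Split a fusion sequence after the first occurrence of the site.

    Returns (carrier plus linker, released target). The cut is placed at
    the carboxy terminus of the final Arg of the recognition sequence.
    """
    index = fusion.find(site)
    if index == -1:
        raise ValueError(f"no {site} cleavage site found")
    cut = index + len(site)
    return fusion[:cut], fusion[cut:]


# Hypothetical placeholder sequences, for illustration only.
carrier = "MKTAYLAQ"
target = "APMTQTTS"
carrier_part, released = cleave_at_factor_xa(carrier + "IEGR" + target)
print(carrier_part, released)
```

With a carrier fused at the amino terminus, the linker residues remain on the carrier fragment and the target is released without any linker residues attached. A target that itself contains the recognition sequence would also be cut by the protease, which this sketch does not model.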

Cleavage at the selected cleavage site enables separation of
the heterologous protein or peptide from the fusion protein to
yield the mature heterologous peptide or protein. The mature
peptide or protein may then be obtained in purified form, free from
any polypeptide fragment of the carrier protein to which it was
previously linked. The cleavage site, if inserted into a linker
useful in the fusion sequences of this invention, does not limit
this invention. Any desired cleavage site, of which many are known
in the art, may be used for this purpose.

The optional linker sequence of a fusion sequence of the
present invention may serve as a simple amino acid sequence of a
sufficient length to prevent any steric hindrance between the
carrier molecule and the selected heterologous peptide or protein.
Whether or not such a linker sequence is necessary will depend
upon the structural characteristics of the selected heterologous
peptide or protein and whether or not the resulting fusion protein
is useful without cleavage. Alternatively, where the mature
protein sequence may be naturally cleaved, no linker may be needed.

In one embodiment therefore, the fusion sequence of this
invention contains a carrier sequence, as defined herein, fused
directly at its amino or carboxyl terminal end to the sequence of
the selected peptide or protein. The resulting fusion protein is
thus a soluble cytoplasmic fusion protein. In another embodiment,
the fusion sequence further comprises a linker sequence interposed
between the carrier sequence and the selected peptide or protein
sequence. This fusion protein is also produced as a soluble
cytoplasmic protein. The cytoplasmic fusion protein can be
purified by conventional means.

The present invention comprises a method of producing and
purifying a heterologous protein in E. coli. An E. coli
transformed with recombinant plasmid comprising a gene for a fusion
protein is grown at a suitable temperature. Suitable temperatures
may be in a range of from about 20°C to about 40°C; more
preferably, the range is about 23°C to about 38°C, still more
preferably from about 30°C to about 37°C, and still more
preferably from about 35°C to about 37°C.

The cells are induced to produce the fusion protein and the
cells are harvested and treated to obtain the fraction containing
the soluble fusion protein, which is then treated to isolate the
heterologous protein from the carrier protein in a manner
well-known to one of ordinary skill in the art. The carrier protein
used is one having a solubility probability of 90% or greater as
predicted by the two-parameter Wilkinson-Harrison model described
herein. The present invention also contemplates that the fusion
gene and protein may be made up of more than one carrier protein
gene or carrier protein, including more than one copy of a carrier
or more than one specific type of carrier.

In an especially preferred version of the invention, the
entire fusion protein expressed in the induced E. coli has a
predicted solubility probability of 65% or greater as determined by
the two-parameter Wilkinson-Harrison model described herein. More
preferably the entire fusion protein has a solubility probability
of 75% or greater, and even more preferably has a solubility
probability of 90% or greater, and thus the fusion protein is
substantially soluble, that is, it is at least 65%, 75% or most
preferably at least 90% soluble.

Where used herein the term carrier protein is also meant to
include alternate versions of the carrier protein which comprise
substitutions or deletions of amino acid residues which are not
critical or required for the normal folding of the protein such
that the alternate versions still have a solubility probability of
90% or greater.

The present invention is also directed to a method of
obtaining expression in soluble form of a recombinant protein by
the separate co-expression of NusA protein and the heterologous
protein in the cell. One way to co-express NusA is to transform E.
coli cells with two plasmids, one that comprises the gene for NusA
protein and the other that comprises the gene for a heterologous
protein whose soluble expression is desired (e.g., hIL-3). Both
proteins are induced so that both NusA and the desired protein are
produced in the cytoplasm of E. coli.

Solubility of the desired protein is higher in cells with
overexpression of the NusA gene than in cells without
overexpression of the NusA gene. NusA appears to cause an increase
in solubility of the desired protein due to its transcriptional
pausing activity. It is anticipated that other proteins which
cause transcriptional pausing will also be effective in causing an
increase in the solubility of expressed proteins. In this regard,
the present invention comprises a method of increasing solubility
of expressed proteins by expressing the genes in an expression
system such as E. coli comprising the NusA gene, or other gene,
which encodes a protein which has transcriptional pausing activity.

DNA sequences which hybridize to the sequences for E. coli
carrier proteins defined herein under either stringent or relaxed
hybridization conditions also encode carrier proteins for use in
this invention. An example of one such stringent hybridization
condition is hybridization at 4X SSC at 65°C, followed by a washing
in 0.1X SSC at 65°C for an hour. Alternatively, an exemplary
stringent hybridization condition is in 50% formamide, 4X SSC at
42°C. Examples of non-stringent hybridization conditions are 4X SSC
at 50°C, or hybridization with 30-40% formamide at 42°C. The use of
all such carrier protein sequences is believed to be encompassed in
this invention.

Construction of a fusion sequence of the present invention,
which comprises the DNA sequence of a selected heterologous peptide
or protein and the DNA sequence of a carrier protein sequence, as
defined herein, employs conventional genetic engineering techniques
[see, Sambrook et al., Molecular Cloning, A Laboratory Manual, Cold
Spring Harbor Laboratory, Cold Spring Harbor, N.Y. (1989)]. Fusion
sequences may be prepared in a number of different ways. For
example, the selected heterologous protein may be fused to the
amino terminus of the carrier molecule. Alternatively, the
selected protein sequence may be fused to the carboxyl terminus of
the carrier molecule.

This fusion of a desired heterologous peptide or protein to
the carrier protein increases the solubility of the heterologous
peptide or protein. At either the amino or carboxyl terminus, the
desired heterologous peptide or protein is fused in such a manner
that the fusion does not destabilize the native structure of either
protein. Furthermore, more than one copy of the DNA coding
sequence of the carrier protein (e.g., two or more) may be
used in the fusion gene for increasing solubility of the resulting
fusion protein.
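
Within the two-parameter model, the effect of carrier copy number follows from the fact that residue counts of the fused parts simply add. The Python sketch below illustrates this under stated assumptions: the carrier counts are those listed for GRPE_ECOLI in Table 2, the target composition is entirely hypothetical, linker residues are ignored for simplicity, and the Composition type is an illustrative name rather than anything defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    n: int       # total residues
    turn: int    # Asn + Gly + Pro + Ser
    charge: int  # (Arg + Lys) - (Asp + Glu)

    def __add__(self, other: "Composition") -> "Composition":
        # Fusing two polypeptides adds their residue counts.
        return Composition(
            self.n + other.n, self.turn + other.turn, self.charge + other.charge
        )

    def cv_minus_cvbar(self) -> float:
        cv = 15.43 * self.turn / self.n - 29.56 * abs(self.charge / self.n - 0.03)
        return cv - 1.71


# GrpE counts from Table 2: Asn 8 + Gly 8 + Pro 9 + Ser 7 = 32;
# (Arg 11 + Lys 13) - (Asp 13 + Glu 26) = -15.
GRPE = Composition(n=197, turn=32, charge=-15)
# Hypothetical target composition, chosen only so that it is predicted insoluble.
TARGET = Composition(n=133, turn=33, charge=3)

one_copy = GRPE + TARGET
two_copies = GRPE + GRPE + TARGET

for label, comp in [("target", TARGET), ("1x", one_copy), ("2x", two_copies)]:
    print(label, f"{comp.cv_minus_cvbar():.2f}")
```

In this example the hypothetical target alone has a positive CV-CVbar, and each added carrier copy shifts the fusion toward more negative values, because the carrier contributes a larger share of the total composition.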
This invention is not limited to any specific type of
heterologous peptide or protein. A wide variety of heterologous
(i.e., foreign in reference to the host genome) genes or gene
fragments are useful in forming the fusion sequences of the present
invention. Any selected, desired DNA sequence could be used.
While the compositions and methods of this invention are most
useful for peptides or proteins which are not expressed, expressed
in inclusion bodies, or expressed in very small amounts in
bacterial and yeast hosts, the selected heterologous peptides or
proteins can include any peptide or protein useful for human or
veterinary therapy, diagnostic or research applications in any
expression system. For example, hormones, cytokines, growth or
inhibitory factors, enzymes, modified or wholly synthetic proteins
or peptides can be produced according to this invention in
bacterial, yeast, mammalian or other eukaryotic cells and
expression systems suitable therefor. However, especially
preferred embodiments of the invention comprise heterologous
proteins and genes thereof which are not produced in soluble form
at normal growth temperatures (e.g., 37°C) when fused to
thioredoxin, maltose-binding protein, or glutathione S-transferase,
for example.

A variety of DNA molecules incorporating the above-described
fusion sequences may be constructed for expressing the selected
heterologous peptide or protein according to this invention. At a
minimum, a desirable DNA sequence according to this invention
comprises a fusion sequence described above, in association with,
and under the control of, an expression control sequence capable of
directing the expression of the fusion protein in a desired host
cell. For example, where the host cell is an E. coli strain, the
DNA molecule desirably contains a promoter which functions in E.
coli, a ribosome binding site, and optionally, a selectable marker
gene and an origin of replication if the DNA molecule is
extrachromosomal. Numerous bacterial expression vectors containing
these components are known in the art, and
can easily be constructed by standard molecular biology techniques.
Similarly, known yeast and mammalian cell vectors and vector
components may be utilized where the host cell is a yeast cell or
a mammalian cell.

The DNA molecules containing the fusion sequences may be 
further modified to contain different codons to optimize expression 
in the selected host cell, as is known in the art. 

These DNA molecules may additionally contain multiple copies 
of the carrier protein-encoding DNA sequence, with the gene
encoding the heterologous protein fused to only one of the repeated 
carrier DNA sequences, or with the gene encoding the heterologous 
protein fused to all copies of the DNA of the carrier such that 
the resulting fusion protein comprises two or more copies of the 
carrier protein. 

In an alternative version of the invention, more than one type
of carrier protein of the type contemplated herein may be used in
a single fusion protein, wherein the fusion protein is encoded by
at least two different genes each encoding a different carrier
protein.

Host cells suitable for the present invention are preferably
bacterial cells. For example, the various strains of E. coli
(e.g., HB101, NM522, JM109, W3110, and the JM105 strain used in the
following examples) are well-known as host cells in the field of
biotechnology.

To produce the fusion protein of this invention, the host cell 
is either transformed with, or has integrated into its genome, a 
DNA molecule comprising a DNA sequence encoding a carrier protein 
fused to the DNA sequence of a selected heterologous peptide or 
protein under the control of an expression control sequence capable 
of directing the expression of a fusion protein. The host cell is 
then cultured under known conditions suitable for fusion protein 
production. If the fusion protein accumulates in the cytoplasm of 
the cell it may be released by conventional bacterial cell lysis 
techniques and purified by conventional procedures including 
selective precipitations and column chromatographic methods. If a 
secretory leader is incorporated into the fusion molecule 
substantial purification is achieved when the fusion protein is 
secreted into the periplasmic space or the growth medium. 

A protein secreted into the periplasmic space may be 
selectively released from the cell by osmotic shock or freeze/thaw 
procedures. Although final purification is still required for most 
purposes, the initial purity of fusion proteins in preparations 
resulting from these procedures is generally superior to that 
obtained in conventional whole cell lysates, reducing the number of 
subsequent purification steps required to attain homogeneity. In 
a typical osmotic shock procedure, the packed cells containing the
fusion protein are resuspended on ice in a buffer containing EDTA
and having a high osmolarity, usually due to the inclusion of a
solute, such as 20% w/v sucrose, which cannot readily
cross the cytoplasmic membrane. During a brief incubation on ice
the cells plasmolyze as water leaves the cytoplasm down the osmotic
gradient. The cells are then switched into a buffer of low
osmolarity, and during the osmotic re-equilibration the contents of
the periplasm are released to the exterior. A simple
centrifugation following this release removes the majority of
bacterial cell-derived contaminants from the fusion protein
preparation. Alternatively, in a freeze/thaw procedure the packed
cells containing the fusion protein are first suspended in a buffer
containing EDTA and are then frozen. Fusion protein release is
subsequently achieved by allowing the frozen cell suspension to
thaw. The majority of contaminants can be removed as described
above by a centrifugation step. The fusion protein is further
purified by well-known conventional methods.

The resulting fusion protein is stable and soluble, often with 
the heterologous peptide or protein retaining its bioactivity. The 
heterologous peptide or protein may optionally be separated from 
the carrier protein by cleavage, as discussed elsewhere herein. 

The production of fusion proteins according to this invention 
reliably improves solubility of desired heterologous proteins and 
enhances their stability to proteases in the expression system. 
This invention also enables high level expression of certain
desirable therapeutic proteins, which are otherwise produced at low
levels in bacterial host cells.

The following examples illustrate embodiments of the present
invention, but are not intended to limit the scope of the
disclosure.

Examples

As noted above, four carrier proteins (NusA, GrpE, BFR, and
YjgD in two copies) were tested. Genes for fusion proteins were
constructed in expression plasmid pKK223-3 (Pharmacia Biotech)
downstream of the tac promoter. E. coli strain JM105 was
transformed with the recombinant plasmids, and the fusion proteins
were expressed by inducing cells growing at 37°C with
1 mM isopropyl-β-D-thiogalactoside (IPTG). The cells
were harvested 3 hours after the start of induction and then
sonicated to break the cell walls, and the insoluble cell
material was separated by centrifugation. Construction of
such plasmids and transformed strains comprising other carrier
proteins and target proteins, and growth thereof, is considered to
be within the skill of a person of ordinary skill in the art, and
therefore it is not considered necessary to provide further
methodological details.

All of the carrier proteins (NusA, GrpE, 2X-YjgD and BFR) were
tested with human interleukin-3 (hIL-3) as the heterologous
protein. hIL-3 has previously been found to be expressed in insoluble
form in "inclusion bodies" in E. coli (Donahue et al., 1988;
Lutsenko, 1992). hIL-3 has a molecular weight of 15.1 kilodaltons
and is predicted to be insoluble by the revised Wilkinson-Harrison
solubility model (see Table 1). The NusA carrier protein was
additionally tested in fusion proteins with bovine growth hormone
(bGH) and human interferon-γ (hIFN-γ). Both bGH and hIFN-γ were
previously expressed in inclusion bodies in E. coli at 37°C (George
et al., 1985; Haelewyn and De Ley, 1995). bGH and hIFN-γ have
molecular weights of 21.6 kilodaltons and 17.1 kilodaltons,
respectively, and are both predicted to be insoluble by the revised
Wilkinson-Harrison solubility model (Table 1). For convenience,
the linker Ile-Glu-Gly-Arg was used to link the carrier protein to
the heterologous protein in all of the fusion proteins studied.
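
The layout of such a fusion can be sketched at the level of amino acid
strings. The following is only an illustration under stated assumptions:
the carrier is placed at the amino terminus (the text allows either
terminus), the function names are hypothetical, and the short sequences
in the usage note are placeholders rather than the real NusA or hIL-3
sequences. Ile-Glu-Gly-Arg is written in one-letter code as IEGR; this
tetrapeptide is commonly known as a Factor Xa recognition site, although
the text here says only that it was chosen for convenience.

```python
# Hypothetical sketch: assemble carrier + linker + target as plain strings.
LINKER = "IEGR"  # Ile-Glu-Gly-Arg in one-letter code, as named in the text


def build_fusion(carrier, target, linker=LINKER, copies=1):
    """Return the fusion sequence, N- to C-terminus.

    `copies` repeats the carrier (e.g. 2 for a 2X-YjgD style construct).
    Placing the carrier first is an assumption made for this sketch.
    """
    if copies < 1:
        raise ValueError("at least one carrier copy is required")
    return carrier * copies + linker + target


def release_target(fusion, carrier_length, linker=LINKER):
    """Return the part of the fusion that follows the linker.

    `carrier_length` is the total length of all carrier copies. The
    function checks that the linker really sits at that position.
    """
    end = carrier_length + len(linker)
    if fusion[carrier_length:end] != linker:
        raise ValueError("linker not found at the expected position")
    return fusion[end:]
```

For example, with the placeholder strings "MNKE" (carrier) and "APMT"
(target), `build_fusion("MNKE", "APMT", copies=2)` gives a sequence with
two carrier copies, and `release_target` applied with `carrier_length=8`
recovers the placeholder target.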

Both the insoluble cell material and the clarified supernatant
were analyzed for proteins by sodium dodecyl sulfate-polyacrylamide
gel electrophoresis (SDS-PAGE). Equal portions of cell lysate,
soluble fraction, and insoluble fraction were loaded onto the gel.
The SDS-PAGE results are shown in Figures 3 and 4.

Figure 3 shows NusA, GrpE, BFR, and thioredoxin fusion
proteins containing hIL-3. Equal portions of cell lysate, soluble
fraction, and insoluble fraction were loaded. Column (m)
represents markers, (u) the uninduced whole cell lysate, (i)
the induced whole cell lysate, (sol) the soluble fraction, and
(ib) the inclusion body fraction. Fusion proteins were expressed
from plasmid pKK223-3 under control of the tac promoter in E. coli
JM105 at 37°C. Cells were induced with 1 mM IPTG and grown for 3
h post-induction. The western blot was probed with mouse anti-hIL-3
monoclonal antibody and visualized using chemiluminescence.
Percentage solubility is based on the western blots (density of
soluble band divided by the density of the soluble plus insoluble
bands).
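
The percentage solubility calculation described above can be written as
a small function. This is a minimal sketch: the function name is
hypothetical, and the band densities in the usage note are made-up
numbers in arbitrary densitometry units, not values from the blots.

```python
def percent_solubility(soluble_density, insoluble_density):
    """Density of the soluble band divided by soluble plus insoluble, in percent."""
    total = soluble_density + insoluble_density
    if total <= 0:
        raise ValueError("total band density must be positive")
    return 100.0 * soluble_density / total
```

With illustrative densities of 97 (soluble) and 3 (insoluble), the
function returns 97.0. A blot in which the insoluble band is one-fifth
as dense as the soluble band, e.g. `percent_solubility(5, 1)`, works out
to roughly 83%.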

Figure 4 shows SDS-PAGE of soluble and insoluble cell
fractions for 2X-YjgD/hIL-3 clones. SDS-PAGE (A) and western blot
(B) of 2X-YjgD/hIL-3 fusion protein. Equal portions of cell
lysate, soluble fraction, and insoluble fraction were loaded.
Column (m) represents markers, column (u) represents uninduced
whole cell lysate, column (i) represents induced whole cell lysate,
column (sol) represents soluble fraction, and column (ib)
represents inclusion body fraction. Fusion proteins were expressed
from plasmid pKK223-3 under control of the tac promoter in E. coli
JM105 at 37°C. Cells were induced with 1 mM IPTG and grown for 3
h post-induction. The western blot was probed with mouse anti-hIL-3
monoclonal antibody and visualized using chemiluminescence.

These results indicate that the NusA/hIL-3, GrpE/hIL-3,
BFR/hIL-3, and 2X-YjgD/hIL-3 fusion proteins were expressed
substantially in the soluble fraction (97%, 71%, 47% and 100%,
respectively), while the thioredoxin/hIL-3 fusion protein was
expressed primarily in the insoluble fraction (8% solubility). It
is noteworthy that the level of expression of the NusA/hIL-3 fusion
protein is much higher than for the other fusion proteins. Two
YjgD proteins were linked together in the 2X-YjgD/hIL-3 fusion
protein because it was found that when only one YjgD protein was
used, the YjgD/hIL-3 fusion protein was mostly insoluble. Two or
more copies of the solubilizing protein may be used in the present
invention where a single copy does not result in substantial
solubilization of the fusion protein. The YjgD protein is
advantageous to use in a fusion protein since it is a highly acidic
protein (isoelectric point, or pI, of 3.6) and thus could be
conveniently and economically purified by ion exchange
chromatography.

Figure 5 shows a western blot of whole cell induced cultures
expressing hIL-3 fusion proteins with NusA, 2X-YjgD, GrpE, and
thioredoxin. Positions of prestained molecular weight markers are
shown at left. Lane 1, NusA/hIL-3. Lane 2, 2X-YjgD/hIL-3. Lane
3, GrpE/hIL-3. Lane 4, thioredoxin/hIL-3. The blot was probed
with a mouse monoclonal antibody that neutralizes hIL-3 activity
and was visualized using chemiluminescence. Fusion proteins were
expressed from plasmid pKK223-3 under control of the tac promoter
in E. coli JM105 at 37°C. Cells were induced with 1 mM IPTG and
grown for 3 h post-induction.

The blot was probed with mouse anti-hIL-3 monoclonal antibody
and visualized using chemiluminescence. These results show that
the hIL-3 reactivity is at the molecular weight position for the
corresponding fusion protein, indicating that each fusion protein
contains hIL-3. The hIL-3 reactivity is significantly larger and
denser for the NusA/hIL-3 fusion protein than for the other fusion
proteins, which indicates that more hIL-3 is being expressed in the
NusA/hIL-3 fusion than in the other fusions. The small band
between 35 and 50 kDa for NusA/hIL-3 indicates that a slight amount
of cleavage of this fusion protein could be occurring.

Table 3 shows the biological activity of hIL-3 in several
fusion proteins. In order to determine if the hIL-3 present in
each of the fusion proteins was biologically active, indicating
that hIL-3 was properly folded, a cell proliferation assay was
performed on each fusion protein in the crude cell lysates
(Table 3). hIL-3 activity was found to be present in all fusion
proteins, with the highest amount of native activity present in the
NusA/hIL-3 protein. It should be noted that reductions in native
activity are most likely due to the presence of the carrier
protein, which may interfere with the receptor binding properties
of hIL-3. hIL-3 cell proliferation assays using a TF-1 cell line
were performed by Dr. Robert House at the Illinois Institute of
Technology Research Institute (Chicago).

                     hIL-3 (µg/ml)             hIL-3 (µg/ml)         Percent
hIL-3 Fusion         Est. by SDS-PAGE and      Cell Proliferation    Native
Protein              BCA Total Protein Assay   Assay                 Activity

NusA/hIL-3           7.5                       5.0                   67
GrpE/hIL-3           5.2                       0.3                    6
BFR/hIL-3*           8.7                       1.0                   12
Thioredoxin/hIL-3*   7.5                       3.6                   48

* Expressed at 23°C to ensure adequate levels of hIL-3 for the cell
proliferation assay. All other fusion proteins were grown at 37°C.

Table 3. Activity of Recombinant Human Interleukin-3. The activity of hIL-3 was
determined in the soluble crude cell lysates of various fusion protein constructs.
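
The "Percent Native Activity" column of Table 3 appears to be the ratio
of the two hIL-3 concentration columns. The sketch below checks that
reading. It rests on two assumptions that the text does not state
outright: that percent native activity equals the proliferation-assay
concentration divided by the SDS-PAGE/BCA estimate, and that the
numerical values are read correctly from the damaged table. The function
and variable names are hypothetical.

```python
# Values as read from Table 3: (estimated hIL-3, active hIL-3, reported percent).
# Both concentrations are taken to be in the same units (an assumption).
TABLE_3 = {
    "NusA/hIL-3": (7.5, 5.0, 67),
    "GrpE/hIL-3": (5.2, 0.3, 6),
    "BFR/hIL-3": (8.7, 1.0, 12),
    "Thioredoxin/hIL-3": (7.5, 3.6, 48),
}


def percent_native_activity(estimated, active):
    """Active hIL-3 as a percentage of the total hIL-3 estimate."""
    if estimated <= 0:
        raise ValueError("estimated concentration must be positive")
    return 100.0 * active / estimated


def max_discrepancy(table):
    """Largest gap, in percentage points, between computed and reported values."""
    return max(
        abs(percent_native_activity(est, act) - reported)
        for est, act, reported in table.values()
    )
```

Under these assumptions every computed percentage falls within one
percentage point of the reported value, which is what rounding of the
tabulated concentrations would be expected to produce.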

The SDS-PAGE and western blotting results for the NusA/bGH and
NusA/hIFN-γ fusion proteins in the cell lysate, the soluble
fraction, and the insoluble fraction are shown in Figures 6 and 7,
respectively.

Figure 6 shows (A) an SDS-PAGE and (B) a western blot of
bovine growth hormone (bGH) expressed as a fusion to NusA. Equal
portions of cell lysate, soluble fraction, and insoluble fraction
were loaded. Column (m) indicates molecular weight markers, (u)
indicates uninduced, (i) indicates induced, (sol) indicates soluble
fraction, and (ib) indicates insoluble fraction. The NusA/bGH
fusion protein was expressed from plasmid pKK223-3 under control of
the tac promoter in E. coli JM105 at 37°C. Cells were induced with
1 mM IPTG and grown for 3 h post-induction. The western blot was
probed with rabbit anti-bGH polyclonal antibody and visualized
using chemiluminescence. The percentage solubility based on the
western blot was 89%.

Figure 7 shows (A) an SDS-PAGE and (B) a western blot of human
interferon-γ (hIFN-γ) expressed as a fusion to NusA. Equal
portions of cell lysate, soluble fraction, and insoluble fraction
were loaded. Column (m) indicates molecular weight markers, (u)
indicates uninduced, (i) indicates induced, (sol) indicates soluble
fraction, and (ib) indicates insoluble fraction. The NusA/hIFN-γ
fusion protein was expressed from plasmid pKK223-3 under control of
the tac promoter in E. coli JM105 at 37°C. Cells were induced with
1 mM IPTG and grown for 3 h post-induction. The western blot was
probed with rabbit anti-hIFN-γ monoclonal antibody and visualized
using chemiluminescence. The percentage solubility based on the
western blot was 87%.

It can be seen that both of these fusion proteins are almost
completely soluble. The western blots indicate that the percentage
solubilities of the NusA/bGH and NusA/hIFN-γ fusion proteins were
89% and 87%, respectively.

Isolation of the fusion proteins and of the heterologous
proteins to complete purity is within the ability of a person of
ordinary skill in the art. Therefore it is not deemed necessary to
further describe said isolation and purification processes herein.

These results agreed very well with the SDS-PAGE results in
Figures 3 and 4 for the cell fractions containing overexpressed
fusion protein. In addition, western blots were performed on cell
fractions for two other fusions induced at 23°C for 3 hours:
bacterioferritin-IL-3 and YjgD-IL-3. Bacterioferritin (denoted by
BFR in Table 2) has a molecular weight of 18.5 kDa, and YjgD has a
molecular weight of 15.6 kDa. For both of these fusion proteins,
the IL-3 was predominantly in the soluble fraction (the amount in
the insoluble fraction was approximately one-fifth of that in the
soluble fraction for BFR and 0% for YjgD). Also, according to
western blotting results, at 37°C the bacterioferritin-IL-3 fusion
protein was partially soluble (approximately 50% of the total was
soluble), compared to very little solubility for the
thioredoxin-IL-3 fusion protein at this same temperature.

Bacterioferritin by itself is predicted to have a solubility
probability of 95% according to the revised solubility model of
Wilkinson and Harrison. Thus, the revised model of Wilkinson and
Harrison correctly predicts that bacterioferritin has better
solubilizing ability than thioredoxin. Bacterioferritin and NusA
have identical predicted solubilities, and it is believed that
NusA-IL-3 is more soluble than bacterioferritin-IL-3 at identical
induction temperatures because NusA is a considerably larger
protein. An advantage of the use of bacterioferritin in fusion
proteins is that the cells containing the fusion protein are
reddish in color because of the presence of the bacterioferritin.
Thus, screening for the presence of bacterioferritin in cells is
very simple.
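
The selection criterion recited in claim 1 (CV - CV bar < -2.10, with
the constants 15.43, -29.56 and 1.71) can be evaluated directly from an
amino acid sequence. The sketch below is a hedged illustration, not the
authors' software: the function names are hypothetical, and the absolute
value around the charge term is an assumption taken from the published
form of the Wilkinson-Harrison model, because the extracted text of the
claim does not preserve it. The test sequences in the usage note are
artificial.

```python
LAMBDA_1 = 15.43    # coefficient of the (N+G+P+S)/n term, from claim 1
LAMBDA_2 = -29.56   # coefficient of the charge term, from claim 1
CV_BAR = 1.71       # discriminant, from claim 1
THRESHOLD = -2.10   # CV - CV_BAR must be below this for >= 90% predicted solubility


def canonical_variable(sequence):
    """Compute CV for a protein given in one-letter amino acid code."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must not be empty")
    turn_formers = sum(seq.count(aa) for aa in "NGPS")
    net_charge = (seq.count("R") + seq.count("K")) - (seq.count("D") + seq.count("E"))
    # abs() is an assumption; see the note above.
    return LAMBDA_1 * turn_formers / n + LAMBDA_2 * abs(net_charge / n - 0.03)


def meets_claimed_threshold(sequence):
    """True if CV - CV_BAR is below the -2.10 cutoff recited in claim 1."""
    return canonical_variable(sequence) - CV_BAR < THRESHOLD
```

As a sanity check on artificial inputs, a strongly acidic sequence such
as "DDDDEEEEAA" passes the cutoff, while a sequence rich in Asn, Gly,
Pro and Ser with no net charge, such as "NGPSNGPSAA", does not.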

For the fusion proteins disclosed in this invention,
expression in E. coli in the temperature range of room temperature
(20-23°C) to 40°C should be considered. The optimal temperature
for growth of E. coli is 37°C. Expression at temperatures lower
than 37°C will give slower rates of cell growth and protein
expression but, in some instances, may give acceptable rates down
to room temperature. For example, the bacterioferritin-IL-3 fusion
protein provided good expression levels for 3 hours of growth after
induction at 23°C.

Numerous modifications and variations of the present invention
are included in the above-identified specification and are expected
to be obvious to one of skill in the art. Such modifications and
alterations to the compositions and processes of the present
invention are believed to be encompassed in the scope of the claims
appended hereto.

What is claimed is:

1. A fusion gene comprising a first gene encoding a
heterologous protein and a second gene encoding a carrier protein
and wherein the carrier protein encoded by the second gene is one
of:

(a) an E. coli protein having a predicted solubility
probability of at least 90% based on a calculated
value of CV - CV bar < -2.10 wherein:

$$CV = \lambda_1\left(\frac{N+G+P+S}{n}\right) + \lambda_2\left|\frac{(R+K)-(D+E)}{n} - 0.03\right| \qquad \text{(I)}$$

and

wherein:

CV bar is the discriminant = 1.71;
n = total number of amino acids in the carrier protein;
N, G, P, S = the number of Asn, Gly, Pro and Ser residues, respectively;
R, K, D, E = the number of Arg, Lys, Asp, and Glu residues, respectively;
$\lambda_1 = 15.43$ and $\lambda_2 = -29.56$; and

(b) a protein which differs from the protein of (a) only
by having one or more conservative amino acid
substitutions wherein the predicted solubility
probability of the protein is not decreased below
90%.



2. A vector comprising the fusion gene of (a) of claim 1.

3. A host cell transformed or transfected with the vector of
claim 2.

4. The host cell of claim 3 wherein the host cell is E. coli.

5. A vector comprising the fusion gene of (b) of claim 1. 

6. A host cell transformed or transfected with the vector of
claim 5.

7. The host cell of claim 6 wherein the host cell is E. coli 
or another bacterium. 

8. The host cell of claim 6 wherein the host cell is a 
yeast, an insect cell, a mammalian cell or other eukaryotic cell. 

9. The fusion gene of claim 1 wherein the first gene and
second gene are linked via a linker DNA sequence encoding a linker
peptide.

10. The fusion gene of claim 9 wherein the linker peptide 
encoded by the linker DNA sequence is cleavable. 

11. A vector comprising the fusion gene of claim 1 and an
expression control sequence operatively linked to the fusion gene.

12. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins RPSD,
FTSY, AMY2, NUSA, GRPE, BFR, YJGD, YRFI, MAZG, and SSEB.

13. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins YCHA,
YAGJ, YFBN, NARJ, NARW, YECA, CHEZ, SLYD, YJAG, and YIEJ.

14. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins YGFB,
YJDC, YCDY, AADB, FLAV, FLAW, YCED, YFHE, ASR, and YGGD.

15. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins YHBS,
FTN, MENG, YBEL, S3 AD, SMG, HYCI, SECB, YBEY, and ELAA.

16. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins YFJX,
MIOC, HYFJ, RL16, RS6, YHHG, GCSH, TRD5, MSYB, and RS12.

17. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins RL7,
YACL, YBFG, RL20, HYPA, PTCA, YZPK, HYBF, and FER.

18. The fusion gene of claim 1 wherein the second gene is
selected from the group of genes encoding E. coli proteins YR7J,
YGGL, CYAY, YEHK, YR7G, YQFB, GLPE, YCCD, and RS14.

19. The fusion gene of claim 1 wherein the second gene 
encodes a protein wherein n is at least 100. 

20. The fusion gene of claim 1 further comprising at least 
one additional copy of the second gene. 

21. The fusion gene of claim 1 further comprising one or 
more copies of a third gene encoding a carrier protein like (a) or 
(b) , wherein the third gene is different from the second gene. 

22. A method of producing a soluble heterologous protein,
comprising:

culturing the host cell having a fusion gene as claimed
in claim 1 under conditions causing the expression
of the fusion gene therein;

recovering the expressed fusion protein from the
cultured cell in a soluble fraction thereof; and

cleaving the heterologous protein from the fusion
protein and obtaining the heterologous protein in
a substantially purified and soluble form.

23. The method of claim 1 wherein the host cell further
comprises an expression control sequence operatively linked to the
fusion gene.

24. A fusion protein produced by the process of:

culturing a host cell having a fusion gene as
claimed in claim 1 under conditions causing the expression of the
fusion gene therein; and

obtaining a fusion protein formed from expression of
the fusion gene in a soluble fraction of the cultured host cell.

25. A fusion gene comprising a first gene encoding a 
protein and a second gene encoding a carrier protein. 

26. A fusion gene as claimed in claim 25, wherein the
carrier protein encoded by the second gene is one of:

(a) an E. coli protein; and

(b) a protein which differs from the protein of (a)
only by having one or more conservative amino acid
substitutions.

27. A method of producing a protein, comprising:

culturing a host cell having a fusion gene as
claimed in claim 1 or 25 under conditions causing the expression
of the fusion gene therein;

recovering the expressed protein from the
cultured cell.

28. A fusion protein produced by the process of:

culturing a host cell having a fusion gene as
claimed in claim 1 or 25 under conditions causing the expression
of the fusion gene therein; and

obtaining a protein formed from expression of
the fusion gene.